[57] ABSTRACT

A digital video encoder encodes a video frame into a
differential video frame for transmission over a packet
switched network. The video encoder includes an inter-frame
encoder, an intra-frame encoder, and an encoding
selector for selecting between the inter-frame and
intra-frame encoder depending on the relative motion
between the video frame being encoded and the previous
video frame.

A composite frame combiner provides a composite 
intra/inter-frame encoded difference frame having one
set of pixels encoded by the inter-frame encoder, and 
another set of pixels encoded by the intra-frame en- 
coder. The set of intra-frame encoded pixels includes at 
least one square or rectangular pixel block, a vertical 
strip of pixel blocks, and a horizontal strip of pixel 
blocks. 

Difference frames are encoded into separable data sets 
representing video information within a particular 
range of image resolution. A discrete cosine transform 
(DCT) is used to transform the difference pixels into
corresponding DCT coefficients which are separable, 
by resolution, into the data sets providing coefficient 
layers. 

A packetizer formats the data sets into asynchronous 
transfer mode (ATM) packets for transmission over
the network.
TABLE I
INTRA-FRAME CODER

FIG.    CHARACTERISTICS                               S/N RATIO (dB)   BIT RATE (bpp)
11(A)   LOW RES. (NO CELL LOSS)                       29.26            0.49
11(B)   MED. RES. (NO CELL LOSS)                      34.12            0.96
11(C)   MED-HIGH RES. (NO CELL LOSS)                  39.27            1.68
11(D)   HIGH RES. (NO CELL LOSS)                      43.32            2.38
13(A)   HIGH RES. (WITH 5% CELL LOSS IN ALL LAYERS
        EXCEPT THE LOW-RES.)                          40.34            2.38
13(B)   HIGH RES. (WITH 5% LOSS IN LOW-RES.
        LAYER ONLY)                                   33.34            2.38

FIG. 14A



TABLE II
COMBINED INTER-INTRA-FRAME CODER

FIG.    CHARACTERISTICS                               S/N RATIO (dB)   BIT RATE (bpp)
        LOW RES. (NO CELL LOSS)                       30.09            0.49
12(B)   MED. RES. (NO CELL LOSS)                      33.67            0.96
        MED-HIGH RES. (NO CELL LOSS)                  33.17            1.68
                                                      40.93
        HIGH RES. (WITH 5% CELL LOSS IN ALL LAYERS
        EXCEPT THE LOW-RES.)                          39.51            2.38
13(D)   HIGH RES. (WITH 5% LOSS IN LOW-RES.
        LAYER ONLY)                                   36.80            2.38

FIG. 14B






LAYERED DCT VIDEO CODER FOR PACKET 
SWITCHED ATM NETWORKS 

BACKGROUND OF THE INVENTION 

This invention relates to systems for transmission and
reception of digital video over packet switched digital 
networks. 

Packet switched digital networks are used to transfer
packets of digitized data among users coupled to the
network. Data to be sent over the network, from a
source device to a receiving device, is typically combined
with a header containing an address to form a
data packet for transmission. The address portion of the
packet directs the packet to the desired receiving device
on the network to establish a virtual communications
channel, or digital end-to-end connection, between the
source device and the receiving device. The data portion
of the packet is formatted to correspond to the
requirements of any one or more of the services available
on the network.

Traditionally, networks were customized to specifically
accommodate only certain types of services,
which led to a variety of incompatible, service-oriented
communications networks. In recent years, the
CCITT has adopted a series of standard multi-purpose
user network interfaces (UNI) for an integrated services
digital network (ISDN) supporting a wide range of
voice and non-voice services over a digital end-to-end
connection. The CCITT has considered a number of
broadband network interface approaches supporting a
wide range of data, voice, and video services, including
Asynchronous Transfer Mode (ATM), which stands
out among the other approaches. ATM is a connection-oriented
transfer technique, where fixed-size ATM data
packets (cells) from different sources are asynchronously
multiplexed onto a communications channel,
providing efficiencies in bandwidth utilization and allocation
of variable bandwidths to different services.

Video services vary greatly in their bandwidth and
image resolution requirements. For instance, video telephone
service requires relatively little bandwidth compared
to the bandwidth requirements for high definition
television (HDTV) services. Data compression is necessary
to minimize the bandwidth requirements for all
video services on a network, especially where network
congestion is anticipated. Furthermore, the differing
image resolution requirements of the various video
services create a compatibility issue among services
desiring to use the same video information transmitted
on the network. A compatible video encoding scheme is
necessary to provide data compatibility among the
video services.

Packet switched networks may experience data loss,
due to, for instance, data buffer overflows or errors in
the packet headers. Data loss typically affects the video
quality of the transmitted images in various ways, depending
on how the video image is encoded and packaged
into ATM data packets. Robust video encoding
techniques are required to minimize the effect of lost
data on the quality of video images sent across the network.

SUMMARY OF THE INVENTION 

In general, in one aspect this invention features an
apparatus and a method for encoding a digital video
frame into a differential video frame for transmission
over a digital communications channel such as a packet
switched network. An inter-frame encoder encodes a
pixel of the video frame into a corresponding differentially
encoded pixel of the difference frame dependent
on another, previous in time, video frame. An intra-frame
encoder encodes a pixel of the video frame into a
corresponding differentially encoded pixel of the difference
frame dependent on other pixels within the same
video frame. An encoding selector selects between either
the inter-frame encoder or the intra-frame encoder
for encoding the pixels of the video frame dependent on
the relative motion between the video frame being encoded
and the previous video frame.
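
The two kinds of differential encoding can be illustrated with a small Python sketch. It is an illustration only, not the encoder of the invention: the function names are hypothetical, frames are plain lists of rows, the intra-frame case uses the previous-pixel predictor within vertical strips that the detailed description gives as one preferred method, and predicting the first pixel of each strip as 0 is an assumption made here.

```python
def inter_frame_difference(curr, prev):
    """Difference each pixel against the co-located pixel of the previous frame."""
    return [[c - p for c, p in zip(curr_row, prev_row)]
            for curr_row, prev_row in zip(curr, prev)]


def intra_frame_difference(frame, strip_width):
    """Difference each pixel against the previous pixel on the same line.

    The predictor restarts at every vertical strip boundary; the first pixel
    of each strip is predicted as 0 (an assumption made for this sketch).
    """
    diff = []
    for row in frame:
        out = []
        for m, value in enumerate(row):
            predicted = 0 if m % strip_width == 0 else row[m - 1]
            out.append(value - predicted)
        diff.append(out)
    return diff


def intra_frame_reconstruct(diff, strip_width):
    """Invert intra_frame_difference by accumulating along each line."""
    frame = []
    for row in diff:
        out = []
        for m, delta in enumerate(row):
            predicted = 0 if m % strip_width == 0 else out[m - 1]
            out.append(predicted + delta)
        frame.append(out)
    return frame


prev = [[10, 10, 12, 12], [20, 20, 22, 22]]
curr = [[10, 11, 12, 15], [20, 20, 25, 22]]
print(inter_frame_difference(curr, prev))
print(intra_frame_difference(curr, strip_width=2))
```

Because the intra-frame predictor never reaches across a strip boundary or into another frame, a corrupted difference value can only disturb pixels to its right within the same strip, whereas an error in an inter-frame difference carries forward into later frames.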

Preferred embodiments include an encoding selector 
having a motion detector for detecting the relative 
motion between the video frame to be encoded and the 
previous video frame. The motion detector provides a 
decision parameter K representing the level of relative 
detected motion. The encoding selector compares K 
against a threshold parameter T and selects the inter-frame
encoder when K<T and the intra-frame encoder
when K≥T.
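
The threshold rule can be sketched in a few lines of Python. The decision parameter K is taken as already computed by the motion detector; the function names and the sample K values are hypothetical and serve only to show how the choice of T shifts the mix of coding modes.

```python
def select_coding_mode(k, t):
    """Return 'inter' when K < T, otherwise 'intra' (K >= T)."""
    return "inter" if k < t else "intra"


def mode_counts(k_values, t):
    """Count how many frames fall into each coding mode for a given threshold."""
    counts = {"inter": 0, "intra": 0}
    for k in k_values:
        counts[select_coding_mode(k, t)] += 1
    return counts


# Illustrative per-frame motion measures (made-up values).
k_per_frame = [0.2, 0.4, 3.1, 0.3, 5.0, 0.1]
print(mode_counts(k_per_frame, t=1.0))
print(mode_counts(k_per_frame, t=4.0))
```

Raising T sends more frames to the inter-frame encoder, and lowering it sends more frames to the intra-frame encoder.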

Other preferred embodiments feature a composite 
frame combiner providing a composite intra/inter- 
frame encoded difference frame having one set of pixels 
encoded by the inter-frame encoder, and another set of 
pixels encoded by the intra-frame encoder. The set of 
intra-frame encoded pixels includes at least one square 
pixel block, a vertical strip of pixel blocks, and a hori- 
zontal strip of pixel blocks. Preferred embodiments 
include horizontally offsetting the vertical strip position 
by at least one pixel block width from one frame to the 
next, or similarly vertically offsetting the horizontal 
strip position. 
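
A minimal Python sketch of the moving vertical strip follows, assuming the strip is one pixel block wide, starts at the leftmost block column, advances one block column per frame, and wraps back to the left edge after reaching the right edge. The function names and the wrap-around behavior are assumptions for illustration; the default sizes (512 pixel frame, 16 pixel blocks) are those of the preferred embodiment described later.

```python
def strip_column(frame_index, m=512, p=16, start=0):
    """Block-column index of the intra-coded strip for a given frame."""
    blocks_per_row = m // p
    return (start + frame_index) % blocks_per_row


def block_modes(frame_index, m=512, n=512, p=16):
    """Per-block coding mode map: 'intra' inside the strip, 'inter' elsewhere."""
    col = strip_column(frame_index, m, p)
    return [["intra" if c == col else "inter" for c in range(m // p)]
            for _ in range(n // p)]


# Every block column is visited once in M/P consecutive frames.
visited = {strip_column(i) for i in range(512 // 16)}
print(len(visited))
```

With these sizes the strip covers each of the 32 block columns once in 32 consecutive frames, so every block is refreshed with intra-frame data within that period.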

Yet other preferred embodiments feature a layered 
resolution encoder for encoding the pixels of the differ- 
ence frame into separable data sets each representing 
video information within a particular range of image 
resolution. A discrete cosine transform (DCT) is used to 
transform the difference pixels into corresponding DCT 
coefficients which are separable, by resolution, into the 
data sets providing coefficient layers. The DCT is per- 
formed on square blocks of pixels within the difference 
frame to provide corresponding blocks of DCT coeffi- 
cients. Each block of DCT coefficients is separated into 
coefficient layers. 
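
One way to picture the separation into layers is the Python sketch below. It assumes the block of DCT coefficients has already been computed, uses a conventional zig-zag scan from the upper left-hand corner (the exact scan order of FIG. 7 is not reproduced here), and takes the layer sizes 21, 24, 46 and 60 from the P=16 preferred embodiment given later in the description. The function names are hypothetical.

```python
# Layer sizes for a 16x16 block in the preferred embodiment; the remaining
# 105 very-high resolution coefficients are not assigned to a layer.
LAYER_SIZES = (21, 24, 46, 60)


def zigzag_indices(p):
    """(row, col) positions of a p x p block in zig-zag order from the top-left."""
    order = []
    for s in range(2 * p - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - p + 1), min(s, p - 1) + 1)
        diag = [(r, s - r) for r in rows]
        # Alternate the traversal direction on successive anti-diagonals.
        order.extend(diag if s % 2 else reversed(diag))
    return order


def split_layers(block, sizes=LAYER_SIZES):
    """Split a coefficient block into resolution layers plus the leftover tail."""
    scan = [block[r][c] for r, c in zigzag_indices(len(block))]
    layers, start = [], 0
    for size in sizes:
        layers.append(scan[start:start + size])
        start += size
    return layers, scan[start:]


# Stand-in coefficient block: each entry records its own position.
block = [[r * 16 + c for c in range(16)] for r in range(16)]
layers, discarded = split_layers(block)
print([len(layer) for layer in layers], len(discarded))
```

The layer sizes happen to align with whole anti-diagonals of the 16x16 block (the first 6, 9, 13 and 17 diagonals hold 21, 45, 91 and 151 coefficients), so layer membership does not depend on the direction in which each diagonal is traversed.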

Still other preferred embodiments include a packe- 
tizer for formatting the video image data sets into asyn- 
chronous transfer mode (ATM) packets for transmis- 
sion over the digital communications channel. The 
ATM packets include a header field having data for 
establishing a virtual communications channel between 
selected devices on the digital communications channel, 
and an information field for transferring the data sets 
between the selected devices. The information field 
includes an adaptation overhead field having a cell 
sequence number, and a sync flag. The logical state of 
the sync flag indicates the composition of the remainder 
of the information field. In one case, the remainder of 
the information field includes an adaptation overhead 
portion having a coding mode field, a comp type field, 
a strip location field and a resolution information field. 
In another case, the remainder of the information field 
includes a data field having the DCT coefficient data 
sets. 
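
The two compositions of the information field can be sketched in Python as follows. Only the list of fields comes from the text above; the byte layout, the field widths, the meaning assigned to each state of the sync flag, and all function names are assumptions made for this illustration. The 48-byte information field is the standard payload size of a 53-byte ATM cell.

```python
ATM_PAYLOAD_BYTES = 48  # information field of a standard 53-byte ATM cell


def build_info_field(seq, sync, body):
    """Pack sequence number, sync flag and body into a 48-byte information field.

    Assumed layout: byte 0 holds the sync flag in bit 7 and a 7-bit cell
    sequence number in bits 0-6; the body follows, zero-padded to 48 bytes.
    """
    if not 0 <= seq < 128:
        raise ValueError("sequence number must fit in 7 bits")
    if len(body) > ATM_PAYLOAD_BYTES - 1:
        raise ValueError("body too long for one cell")
    first = (0x80 if sync else 0x00) | seq
    return bytes([first]) + body.ljust(ATM_PAYLOAD_BYTES - 1, b"\x00")


def build_overhead_cell(seq, coding_mode, comp_type, strip_location, resolution_info):
    """Sync flag set (assumed): remainder carries the adaptation overhead fields."""
    body = bytes([coding_mode, comp_type, strip_location, resolution_info])
    return build_info_field(seq, True, body)


def build_data_cell(seq, coefficient_bytes):
    """Sync flag clear (assumed): remainder carries DCT coefficient data."""
    return build_info_field(seq, False, coefficient_bytes)


def parse_info_field(field):
    """Decode an information field built by the functions above."""
    if len(field) != ATM_PAYLOAD_BYTES:
        raise ValueError("information field must be 48 bytes")
    seq, sync = field[0] & 0x7F, bool(field[0] & 0x80)
    if sync:
        coding_mode, comp_type, strip_location, resolution_info = field[1:5]
        return {"seq": seq, "sync": True, "coding_mode": coding_mode,
                "comp_type": comp_type, "strip_location": strip_location,
                "resolution_info": resolution_info}
    return {"seq": seq, "sync": False, "data": field[1:]}


print(parse_info_field(build_overhead_cell(0, 1, 0, 5, 4)))
```

The receiver in this sketch reads the sync flag first and only then decides whether the rest of the field holds overhead fields or coefficient data.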

The digital video encoding apparatus and method of this invention
thus provides a flexible system for transferring video information
among a wide range of video devices on a digital communications
network. The layered resolution encoding of this invention provides
compatibility among the different video services on the network,
while maintaining the video quality of each type of service. Layered
encoding also provides significant data compression and adds
robustness to digital packet cell loss. Further, network congestion
control is provided by dropping cells containing higher resolution
information without introducing significant visual artifacts.

The composite intra/inter-frame encoding mode of this invention
alleviates the problem of error propagation through inter-frame
encoded video frames by providing a circulating intra-frame encoded
strip which refreshes a section of each inter-frame encoded frame.
Variable bit rate (VBR) output is generated by exploiting the
redundancy or variation in the information content from
frame-to-frame, and by dropping those layers of video information
which have insufficient energy content.

Digital network compatibility is maintained by transferring all
video information via standard asynchronous transfer mode (ATM)
packets. The quantity of overhead information required to be sent
with each video frame is minimized by using fixed-length codewords
for each layer of video information, thus requiring video
synchronization information to be sent only once at the beginning of
each frame.

Other advantages and features will become apparent from the
following description of the preferred embodiments and from the
claims.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

We first briefly describe the drawings.

FIG. 1 is a block diagram of a digital communications network using
the digital video encoder of this invention;

FIG. 2 is a block diagram of a combined intra-frame/inter-frame
layered DCT video encoder for encoding a color video frame for
transmission over the digital communications network of FIG. 1;

FIG. 3 is a diagram showing a video frame, encoded by the encoder of
FIG. 2, divided into pixel elements, and the pixel elements grouped
into a pixel block;

FIG. 4 is a diagram showing the video frame of FIG. 3 divided into
pixel blocks;

FIGS. 5(a)-5(c) are sequential diagrams showing the changing
position of an intra-frame mode encoded strip of pixel blocks moving
across inter-frame mode encoded pixel blocks of the video frame of
FIG. 4;

FIG. 6 is a diagram showing a layered resolution coding model
implemented by the video encoder of FIG. 2;

FIG. 7 is a diagram showing the layout of a DCT coefficient block,
corresponding to the pixel block of FIG. 3, for defining the DCT
coefficient layers of the layered resolution coding model of FIG. 6;

FIG. 8 is a diagram showing an alternative layering definition for
the DCT coefficient block of FIG. 7;

FIG. 9 is a diagram showing asynchronous transfer mode (ATM) cell
structures for transferring video frame data, encoded by the video
encoder of FIG. 2, across the digital communications network of
FIG. 1;

FIGS. 10(a)-10(b) are photographs of an original video image encoded
by the video encoder of FIG. 2 using intra-frame mode encoding and
composite intra/inter-frame mode encoding, respectively according to
this invention;

FIGS. 11(a)-11(d) are photographs of the image of FIG. 10(a) after
experimental decoding using various resolution layers;

FIGS. 12(a)-12(d) are photographs of the image of FIG. 10(b) after
experimental decoding using various resolution layers;

FIGS. 13(a)-13(d) are photographs of the images of FIGS. 10(a) and
10(b) after experimental decoding simulating various cell loss
conditions; and

FIGS. 14(a) and 14(b) are tables summarizing experimental
performance results obtained with the video encoder of this
invention.

Referring now to FIG. 1, a digital communications system 10 capable
of transferring digital video information among users includes a
broadband ISDN (B-ISDN) protocol network 12, one or more digital
video sources 14 coupled to the network through a compressor 18, and
one or more video display devices 16 each coupled to the network
through a decompressor 20. Video source 14 [...] frames of digital
video which are transferred to a compressor 18. Compressor 18
encodes, formats and transmits each frame of digital video onto
network 12 in the form of numerous video data packets. Decompressor
20 receives the digital video data packets from the network, decodes
the packetized data, and reconstructs the data into the original
digital video frame. Video display device 16 displays the
reconstructed video frame. Some devices on the network combine a
compressor 18 and a decompressor 20 in the same device to facilitate
operation in a bi-directional mode, e.g., videoconferencing,
videotelephony, etc. Other devices may require only a compressor or
a decompressor and operate in a unidirectional mode, e.g., TV
distribution, catalog services, etc.

Compressor 18 includes a video frame encoder 50 connected to network
12 by a network interface 51, and decompressor 20 includes a video
decoder 21 connected to network 12 by a network interface 22.
Encoder 50 and decoder 21 work together to transfer a frame of
digital video from the source to the display using either an
intra-frame coding mode, or an inter-frame coding mode (which
includes a composite intra/inter-frame coding mode). In the
intra-frame coding mode, a frame of digital video is encoded for
transmission using only the video information present in that frame.
In the inter-frame coding mode, a frame of digital video is encoded
using video information present in that frame, as well as video
information from previous video frames, usually the frame
immediately prior in time to the frame being encoded. In the
composite intra/inter-frame coding mode, a frame of digital video is
encoded using a combination of the intra-frame and inter-frame
coding modes.

FIG. 2 shows a block diagram of a digital video frame encoder 50,
according to this invention, for transforming a high-bandwidth color
video signal into a coded, compressed, and resolution-layered,
digital data bit stream for transmission across network 12 in
asynchronous transfer mode (ATM) formatted packets. A standard RGB,
YIQ, or NTSC color video signal is applied to the input 52 of domain
transformer 54 which transforms each color video frame of any of
these input signal types into a digital Y-component video frame on
line 56, a digital I-component video frame on line 58 and a digital
Q-component video frame on line 60. The I- and Q-component frames
are undersampled (or decimated) by a factor of two by sampler 62 to
produce reduced resolution I- and Q-component frames on lines 64 and
66, respectively.

An encoder 68 compresses the video data by independently encoding
each of the Y-, I-, and Q-component frames on lines 56, 64, and 66,
respectively, into either an intra-frame mode coded difference frame
or a composite intra/inter-frame mode coded difference frame,
depending on the relative motion between successive video frames of
the same component type. Encoder 68 automatically selects between
the two encoding modes by comparing the present Y-, I-, and
Q-component frames on lines 56, 64, and 66, respectively, with the
corresponding previous Y-, I- and Q-component frames available on
lines 70, 72, and 74.

Each of the Y-, I-, and Q-component encoded difference frames output
from encoder 68 is passed through a layered discrete cosine
transform (DCT) 76 which separates the video information present in
each difference frame into layers of DCT coefficients based upon
resolution, ranging from a low-resolution DCT layer containing basic
image information, to a high-resolution DCT layer containing
detailed image information. A quantizer 78 quantizes the DCT
coefficients, representing the low-resolution DCT coefficients with
a larger digital word size than the high-resolution DCT
coefficients. An entropy coder 80 encodes the quantizer outputs into
a coded bit stream 82 suitable for packetizing. The quantizer
outputs are also used to feed back the previous Y-, I- and
Q-component frame to the encoder 68 on lines 70, 72, and 74,
respectively. This is accomplished by passing the quantizer 78
outputs [...]

Referring to FIG. 3, a frame f_i of digital video 40, where i is the
index number of the present video frame, is divided into N
horizontal lines (rows) of M (columns) video pixels 42 each, i.e.,
an MxN two dimensional array of pixels. A pixel block 44 is defined
here as being a square grouping of pixels 42 having P pixels on each
side, i.e., a PxP two dimensional square array of pixels. It should
be noted that the pixel block may be rectangular as well. For
convenience, P, M, and N are chosen so that M and N are both integer
multiples of P, which produces an integer number of non-overlapping
pixel blocks 44 contained in each video frame 40. In the preferred
embodiment, N equals 512, M equals 512, and P equals 16. This
produces a convenient square frame of digital video having 32 pixel
blocks on each side for a total of 1,024 blocks in each frame as
shown in FIG. 4.

[...] color video image. Each pixel of f_Yi represents only the
Y-component for the corresponding full color pixel, each pixel of
f_Ii represents only the I-component for the corresponding full
color pixel, and each pixel of f_Qi represents only the Q-component
for the corresponding full color pixel. The color video image is
reconstructed by inverse transforming video frames f_Yi, f_Ii, and
f_Qi into R-, G-, and B-component frames f_Ri, f_Gi and f_Bi,
respectively, which are then overlaid on each other. Unless
otherwise specified, the following references to video frame f_i
refer equivalently to each component video frame type f_Yi, f_Ii, or
f_Qi (or f_Ri, f_Gi or f_Bi).

The Y-component carries the highest bandwidth information about the
color image, typically at least four times that of either the I- or
Q-component. Therefore, less pixels are required to sufficiently
convey the I and Q information, than are required to convey the Y
information for the same image. In the preferred embodiment, a
single I or Q pixel corresponds to a square block covering an area
of four Y pixels. Thus, for an MxN (512x512) pixel color image
having a Y-component frame f_Yi containing MxN (512x512) pixels, the
I-component frame f_Ii and Q-component frame f_Qi need only contain
M/2xN/2 pixels (256x256) each, with each I or Q pixel covering four
times the image area of a Y pixel.

Transformation of the I- and Q-components from the 512x512 pixel
video frame to the 256x256 pixel video frame is accomplished by
subsampling the I- and Q-video components by a factor of two, and
replacing blocks of four I or Q pixels with a single pixel
containing the average (or some other measure) of the four pixels
replaced.

The transmission efficiency of a particular video frame encoding
mode depends on the [...] characteristic of the video frame being
transmitted. A video frame f_i having little or no motion relative
to the previous frame f_{i-1} is most efficiently encoded by taking
advantage of the temporal redundancy between frames [...] between
these successive frames, i.e., using an inter-frame coding mode. A
video frame f_i having significant motion relative to the previous
frame f_{i-1} is most efficiently encoded by taking advantage of the
spatial redundancy [...], i.e., using an intra-frame coding mode.

[...] with the intra-frame coding mode or the composite
intra/inter-frame coding mode, a difference frame [...] previous
frame f_{i-1} by taking the difference between the absolute pixel
values of corresponding pixels in these successive frames. The
difference frame Δf_i consists of difference pixels Δx_i(m,n)
computed by

$$\Delta x_i(m,n) = x_i(m,n) - x_{i-1}(m,n) \qquad (1)$$

[...]

$$\overline{\Delta x_i} = \frac{1}{J} \sum_{j=1}^{J} \Delta x_i(j) \qquad (2)$$

and an estimate of the variance is computed by

$$\sigma^2 = \frac{1}{J} \sum_{j=1}^{J} \left( \Delta x_i(j) - \overline{\Delta x_i} \right)^2 \qquad (3)$$

where Δx_i(j) is a short notation for the J randomly selected pixels
Δx_i(m,n).
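
The mean and variance estimates over J randomly selected difference pixels can be sketched in Python as below. The sketch rests on stated assumptions: the J positions are drawn without replacement, the variance estimate divides by J, and frames are plain lists of rows. The function name `difference_stats` and the `seed` parameter are hypothetical.

```python
import random


def difference_stats(curr, prev, j, seed=None):
    """Mean and variance of J randomly chosen inter-frame difference pixels.

    curr and prev are equally sized frames (lists of rows of pixel values).
    Positions are sampled without replacement, so j must not exceed the
    number of pixels in a frame.
    """
    rng = random.Random(seed)
    rows, cols = len(curr), len(curr[0])
    positions = rng.sample(range(rows * cols), j)
    diffs = [curr[pos // cols][pos % cols] - prev[pos // cols][pos % cols]
             for pos in positions]
    mean = sum(diffs) / j
    variance = sum((d - mean) ** 2 for d in diffs) / j  # 1/J normalization assumed
    return mean, variance


still = [[7] * 8 for _ in range(8)]
brighter = [[12] * 8 for _ in range(8)]
print(difference_stats(still, still, j=10, seed=1))
print(difference_stats(brighter, still, j=10, seed=1))
```

Sampling only J pixels keeps the cost of the motion estimate low compared with differencing the whole frame. A uniform brightness change, as in the second call, gives a non-zero mean but zero variance.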
Next, a decision parameter K, chosen to be one measure of the amount
of motion between frames f_{i-1} and f_i, is computed from the mean
$\overline{\Delta x_i}$ and variance σ² of the difference pixels
Δx_i(m,n) by

[...] (4)

where k1 and k2 are weighting coefficients each having a value
between 0 and 1 such that k1+k2=1. A threshold parameter T is chosen
such that for K<T, an inter-frame coding mode is selected, and for
K≥T, an intra-frame coding mode is selected. Variance σ² is an
important parameter in measuring relative motion for determining
coding mode because the mean $\overline{\Delta x_i}$ is often close
to zero and the variance always yields a positive value.

The sensitivity of the overall video quality to cell loss and data
corruption depends to a large extent on the percentage of video
frames encoded with each of the coding modes, i.e., inter-frame mode
encoded video frames are more sensitive to cell loss and data
corruption than intra-frame mode encoded frames because errors
propagate through reconstructed inter-frame mode encoded frames.
Independent selection of the threshold parameter T allows
flexibility in adjusting the encoding percentages, and thereby the
system performance.

If the decision is to encode a video frame f_i in the intra-frame
mode, an intra-frame mode coded difference frame Δf_i is generated.
A preferred method for generating an intra-frame coded difference
frame Δf_i requires dividing the video frame f_i into vertical
strips of equal width, and applying a previous pixel predictor
independently within each strip to generate difference pixels
Δx_i(m,n) from the video frame pixels x_i(m,n) within the strip.
Each difference pixel Δx_i(m,n) of pixel frame Δf_i represents the
difference between the actual pixel value x_i(m,n) and a
corresponding predicted pixel value $\hat{x}_i(m,n)$, where the
predicted pixel value is the actual value of the previous pixel
along the same n-th line of pixels within the same pixel strip.
Mathematically, this is given by

$$\hat{x}_i(m,n) = x_i(m-1,n) \qquad (5)$$

and

$$\Delta x_i(m,n) = x_i(m,n) - \hat{x}_i(m,n) \qquad (6)$$

Thus, the proper reconstruction of each pixel along the same line
within a strip depends on the proper reconstruction of the previous
pixel along that line. Applying a new predictor within each strip
effectively limits the propagation of decoded errors along a line to
the strip itself.

Another preferred method for generating an intra-frame mode coded
difference frame Δf_i requires first dividing the present video
frame f_i into PxP pixel blocks. The average value of each pixel
block is then determined and subtracted from each pixel within the
corresponding block, thereby producing a difference frame Δf_i. Each
block average is DPCM coded and transmitted with its corresponding
pixel block information so that video frame f_i may be reconstructed
from Δf_i at the receiver. Yet another preferred method of producing
a difference frame Δf_i is to use the boundary pixels of the
previously decoded blocks to estimate the average pixel value of the
next pixel block to be decoded, thus avoiding the need to transmit
the average pixel value for each pixel block.

Referring to FIG. 5(a), a composite intra/inter-frame mode coded
difference frame Δf_i 100 is generated by combining inter-frame
coded pixel blocks 104, making up a majority of the pixel blocks in
the frame, with a portion 102 of intra-frame coded pixel blocks 106.
In the preferred embodiment, portion 102 is a vertical strip having
a column width equivalent to the block width [...] vary [...]
embodiment to another, e.g., portion 102 may take the form of a
[...] diagonal strip, multiple strips, or randomly selected blocks.
Thus, portion 102 of video difference frame Δf_i contains only
intra-frame mode encoded pixel blocks 106, and the remainder of
difference frame Δf_i outside portion 102 contains only inter-frame
coded pixel blocks 104.

The intra-frame mode coded blocks 106 of portion 102 are encoded
using one of the intra-frame mode encoders discussed above, and are
encoded independent of data from another frame or data from the
pixel blocks outside the boundary of portion 102. Thus,
intra/inter-frame mode encoded data contains all the data necessary
to reconstruct the pixel blocks 106 of portion 102 without knowledge
of previous frame data, and all the data necessary to reconstruct
the pixel blocks 104 outside portion 102 with knowledge of previous
frame data.

Referring now also to FIGS. 5(b) and 5(c), vertical strip portion
102, defining the boundary of intra-frame mode encoded pixel blocks
within difference frame Δf_i 100, is advanced to the right in a
sequential and non-overlapping manner every time a new frame is
encoded using the composite intra/inter-frame mode encoding.
Starting from the leftmost edge of the difference frame Δf_i 100
shown in FIG. 5(a), vertical strip portion 102 is advanced one block
column (i.e., P pixels) to the right after one frame time to define
a new vertical strip portion 102' at the position shown in
difference frame Δf_{i+1} 100' of FIG. 5(b). FIG. 5(c) shows the
vertical strip portion 102" positioned at the rightmost block column
of difference frame Δf_{i+L} 100" L frame periods after the first
difference frame Δf_i, where L=(M/P)-1. After L frame periods the
cycle repeats. For the example shown in FIGS. 5(a), 5(b) and 5(c),
in a video sequence whose frame resolution is 512x512 pixels (i.e.,
M=N=512 for Δf_Yi), having a pixel block width P=16 pixels, and a
vertical strip portion 102 having a single pixel block width (i.e.,
L=32), the entire video frame is refreshed with intra-frame coded
video data every 32 frame periods provided all the frames are coded
using the composite intra/inter-frame coding mode. In the case of
Δf_Ii and Δf_Qi, whose frame resolution is 256x256 pixels, L=16, and
the entire video frame is refreshed every 16 frame periods.

Referring to FIG. 6, a resolution-layered encoding technique for
transmitting an intra-frame, or intra/inter-frame, encoded video
difference frame Δf_i across the network 12 (FIG. 1), from a source
network interface 18 (FIG. 1) to a display network interface 20
(FIG. 1), separates the digital video data into layers based on the
video resolution information present in the data. In layered coding,
the digital video information is divided into several layers, with
the lower layers containing low resolution information, and the
higher layers containing higher resolution information. The layers
are priority ordered in descending order of importance to image
reconstruction, with the lower resolution layers, which carry the
most important basic picture information, having a higher priority
than the higher resolution layers, which carry the less important
fine detail picture information.

Such a layered coding model allows integration of video telephony,
broadcast-quality video, and high-quality high definition video
(HDTV/ATV) services. For instance, a video frame encoded using the
layering model shown in FIG. 6 has K layers of coded data, with
layer 1 containing the lowest resolution data, and layer K
containing the highest resolution data. Video telephony service only
requires the data from the lowest resolution layers, perhaps only
layer 1. Broadcast-quality video service requires the data from low
and medium resolution layers. HDTV service requires the data from
all resolution layers 1 through K. A video frame encoded with all K
layers may be reconstructed by all the video services, with each
service using only the data layers it requires to reconstruct a full
quality video image for that service. Analogously, a video service
need only generate, transmit and receive the number of layers
required for that service. Thus, in particular video services where
bandwidth is at a premium, the lower layers can provide the video
quality required by the service at a video data bit rate that is
compatible with the video service receivers.

Referring to FIG. 7, each PxP pixel block 104 or 106 (FIG. 5) of the
digital video difference frame Δf_i 100 (FIG. 5) is layer coded by
first applying a discrete cosine transform (DCT) algorithm to the
video data in the pixel block 104 or 106 to produce a corresponding
PxP block of DCT coefficients 200 (FIG. 7). The DCT coefficient
block 200 includes subsets 202, 204, 206, 208 and 210 of
coefficients which correspond respectively to the low, medium,
medium-high, high and very-high resolution video data present in the
pixel block. The layered coding model of FIG. 6 is implemented by
assigning each subset of DCT coefficients to the data layer
corresponding to the resolution appropriate for the DCT coefficients
in the subset.

In a preferred embodiment shown in FIG. 7, where P=16, i.e., DCT
coefficient block 200 is a 16x16 block with 256 DCT coefficients,
the low (layer 1), medium (layer 2), medium-high (layer 3) and high
resolution (layer 4) layers respectively contain the 21 DCT
coefficients of subset 202, the 24 coefficients of subset 204, the
46 coefficients of subset 206, and the 60 coefficients of subset
208. The remaining 105 DCT coefficients of subset 210 contain
very-high resolution information, typically contain relatively
little energy, and may be

[...] video difference frame Δf_i 100 (FIG. 5) are transmitted
first, followed by the medium resolution, layer 2, coefficients of
subset 204 corresponding to all the pixel blocks, and so on. The
receiver reconstructs the digital video difference frame Δf_i by
first using the low resolution, layer 1, DCT coefficients of subset
202 to supply the basic elements of the image, and later adding
detail to the image with the higher resolution DCT coefficient
layers subsequently received.

Each coefficient subset of the DCT coefficient block 200 of FIG. 7
may be accessed by zig-zag scanning the DCT coefficient block
starting in the uppermost left-hand corner, i.e., following the
sequence 1, 2, 3, 4, 5, 6, etc. shown in FIG. 7. First the low
resolution, layer 1, DCT coefficients of subset 202 are scanned,
then the medium resolution, layer 2, coefficients of subset 204, the
medium-high resolution, layer 3, coefficients of subset 206, and
finally the high resolution, layer 4, coefficients of subset 208.
The very-high resolution coefficients of subset 210 are discarded.

Transmission priority, to a large part, is determined by the
relative importance of the transmitted information to reconstructing
the video image at the receiver. The low resolution, layer 1,
coefficients are given the highest priority among the data layers
since this layer typically contains the highest energy DCT
coefficients which correspond to the basic picture elements. Without
proper reconstruction of the basic picture elements conveyed by the
layer 1 data, the information conveyed by the other data layers may
be meaningless to image reconstruction. The low resolution, layer 1,
coefficients are transmitted first, followed by the medium
resolution, layer 2, coefficients, the medium-high resolution, layer
3, coefficients, and then the high resolution, layer 4,
coefficients. Thus, different network data transmission priority
levels can be assigned data from different layers, with the highest
priority assigned to lowest resolution coefficient layer and
synchronization overhead information, and the lowest priority to the
highest resolution coefficient layer, such that in the event of
network congestion, low priority cells can be discarded without
seriously affecting the video quality at the receiver.

Data compression of the video information is enhanced by discarding
information of little or no consequence to the video quality, e.g.,
the low energy very-high resolution DCT coefficients of subset 210.
Furthermore, if the total energy content of any DCT coeffi-
discarded without substantially affecting the image cient layer is small, that layer, and the layers above it, 
quality of the highest resolution video modes supported 50 may also be discarded and not transmitted. Discarding 
by the network. Although FIG. 7 shows DCT coeffici- multiple layers of DCT coefficients with little or no 
ent block 200 divided into parallel looking, weU-struc- image degradation results in significant data bandwidth 
tured coefficient layers (i.e., DCT coefficient subsets savings for each frame of video data transmitted across 
each corresponding to a different data layer) it is feasi- the network 

ble to define coefficient layers in any shape and size by 55 Referring to FIG. 9, an Asynchronous Transfer 
individually assigning each DCT coefficient into any Mode (ATM) data packet structure, or cell, 400 for 
layer. For instance, FIG. 8 shows another 16x 16 DCT transferring packets of digital information over network 
coefficient block having coefficient layers defined by 12 (FIG, 1) includes a header field 402 and an informa- 
assigning each DCT coefficient to a specific layer, with tion field 404. The header field 402 typically includes, 
the numeral indicating which layer each DCT coeffici- 60 among other data, network level routing information 
ent is assigned to. DCT coefficient assignments remain <i e-, virtual channel identifiers, or VCIs) used to estab- 
fixed for the duration of the coding. fish a virtual channel between devices across the net- 

Referring again to FIG. 7, the DCT coefficient layers work. Information field 404 typically includes the data 
are transmitted over the network in a manner that ena- to be transferred between devices. Information field 404 
bles progressively improving reconstruction of decoded 65 also typically includes an adaptation overhead sub-field 
video frames at the receiver. For example, the low 405 which carries information related to the protocol 
resolution, layer 1, DCT coefficients of subset 202 cor- with which the data in the remainder of information 
responding to all the pixel blocks 104, 106 of the digital field 404 is encoded. 
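The layer split described above with reference to FIG. 7 can be
illustrated with a short Python sketch. Since FIG. 7 is not reproduced
here, the sketch assumes that each subset is a contiguous run of the
zig-zag scan and that the scan alternates direction along anti-diagonals
in the usual way; the function names are illustrative only. Under that
assumption the stated subset sizes of 21, 24, 46 and 60 coefficients fall
exactly on anti-diagonal boundaries, which is consistent with the
"parallel looking" layers mentioned in the text.

```python
LAYER_SIZES = (21, 24, 46, 60)  # subsets 202, 204, 206, 208 for P = 16


def zigzag_order(p):
    """Return the (row, col) positions of a p x p block in zig-zag order."""
    order = []
    for d in range(2 * p - 1):  # d = row + col identifies one anti-diagonal
        rows = range(max(0, d - p + 1), min(d, p - 1) + 1)
        cells = [(r, d - r) for r in rows]
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        order.extend(cells if d % 2 else reversed(cells))
    return order


def split_layers(order, sizes=LAYER_SIZES):
    """Cut the scan into consecutive layers; the tail is the discarded subset."""
    layers, start = [], 0
    for size in sizes:
        layers.append(order[start:start + size])
        start += size
    return layers, order[start:]


layers, discarded = split_layers(zigzag_order(16))
for number, layer in enumerate(layers, start=1):
    diagonals = sorted({r + c for r, c in layer})
    print(f"layer {number}: {len(layer)} coefficients, "
          f"diagonals {diagonals[0]}-{diagonals[-1]}")
print(f"discarded: {len(discarded)} coefficients")
```

With these assumptions, layers 1 through 4 cover anti-diagonals 0-5, 6-8,
9-12 and 13-16 respectively, and the 105 remaining coefficients form the
discarded very-high resolution subset.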
Two types of ATM digital video cell structures 403 (Type I) and 407 (Type
II), corresponding to ATM cell structure 400, are used for transmitting a
digital video difference frame Δf_i across the network. (It is assumed
that the frame resolution of Δf_Y is 512×512 pixels, having 1024 16×16
pixel/coefficient blocks, and that the frame resolution of Δf_I and Δf_Q
each are 256×256 pixels, having 256 16×16 pixel/coefficient blocks.) Type
I and Type II cell structures have the same header field structure 402,
but have different information field structures 404, with the Type I cell
information field taking the form of information field structure 406, and
Type II cell information field taking the form of information field
structure 408. Both information field structures 406 and 408 have a first
subfield 410 defining a cell sequence number indicating the relative
position of the current cell with respect to other transmitted cells. Both
information field structures 406 and 408 also have a second subfield 412
defining a sync flag whose state determines the characteristics of the
remainder of the information field. If the sync flag is set (logic high)
then the cell is of Type I and information field structure 406 defines the
remainder of the field, and if the sync flag is clear (logic low) then the
cell is of Type II and information field structure 408 defines the
remainder of the field.

Type I cells 403 carry frame sync and code specific information in the
adaptation overhead section 405 (shown by the dashed lines extending from
ATM cell structure 400) of its information field 406. An adaptation
overhead section 405 may be larger than information field 406 of a single
Type I cell, in which case the adaptation overhead is distributed across
as many Type I cells as required. Type I cells carrying adaptation
overhead section 405 are sent only once at the beginning of each new video
frame.

Type II cells carry the remainder of the video frame information, i.e.,
DCT coefficients, packed into the information fields 408 of as many Type
II cells as required. Fixed length code words are used for the DCT
coefficients of each layer within each block and therefore [...] needs to
be included in the information field 408 overhead [...] overhead section
includes a coding-mode subfield 414, a component-type subfield [...], a
vertical-strip-location [...] DCT coefficient data subfield 422, outside
adaptation [...] of adaptation overhead data is smaller than the size of
information field 406. More specifically, coding-mode subfield 414
typically contains a single bit which identifies which of the encoding
modes was used to encode the video frame, i.e., either the intra-frame or
composite intra/inter-frame encoding mode. Component-type subfield 416
typically contains 2 bits which identify the video color component type to
which the current video frame data corresponds, i.e., the Y, I, or Q-video
component frame. Vertical-strip-location subfield 418 typically contains
0, 4, or 5 bits which identify the block column location of the vertical
strip portion 102 (FIG. 5) of intra-frame coded data for a composite
intra/inter-frame coded video frame. Resolution information subfield 420
contains data that indicates how many layers of DCT coefficients have been
included for each pixel block of the video frame, i.e., the resolution of
each pixel block.

By allocating more bits to resolution information subfield 420, it is
possible to specify which coefficient layers have been kept and
transmitted, and which layers have been discarded, on a per block basis.
The more specific the information is about which layers are being
transmitted, the higher the data compression becomes. For instance, 256
bits may be allocated to the resolution information subfield for an I- or
Q-component frame, with a single bit representing each pixel block's
resolution, i.e., one bit corresponding to each of the 256 pixel blocks.
In this case, the single bit per block may represent whether there is
sufficient information present in any coefficient layer of the
corresponding block, and thereby whether any coefficients for that block
will be transmitted. If there is little or no information in the block,
then the entire block is discarded. If there is sufficient information in
any coefficient layer in the block, then all four coefficient layers of
the block are transmitted. Similarly, where 1,024 bits are allocated to
the resolution information subfield, the same quantity of resolution
information is available for each pixel block of a Y-component frame,
i.e., one bit corresponding to each of the 1,024 blocks.

For these resolution information subfield structures, a Type I cell would
typically be re[...] carry Y-component sync information [...] cell would
typically be able to carry [...] of either the I- or Q-component [...]
(256 bits). It is expected that the Type [...] overhead information would
be [...] network priority to avoid any [...] would be necessary to remove
the field which carries [...] resolution) of all the [...] information
subfield size of I- and Q-components the same as for the Y-component. For
instance, allocating a 1,024 bit resolution information subfield for I-
and Q-components increases the resolution information from one bit to four
bits corresponding to each pixel block. Four bits per block can readily
[...] which of the four DCT coefficient layers from each corresponding
block are not coded because of insufficient energy [...] the same quantity
of resolution information about the Y-component frame as now being
conveyed about the I and Q component frames). Thus, only those layers with
sufficient energy content are selectively transmitted, which enables
individual specification of whether to drop or keep each layer in each
block. Although overhead apparently increases, there is a significant
reduction in the total number of coded bits generated on a per frame
basis, resulting in higher compression efficiency.

A fixed number of bits may be assigned to the coefficients of each layer,
which may then be entropy encoded to further increase the compression.
However, employing variable codeword entropy coding would adversely impact
the resynchronization of video in the event of cell loss unless
resynchronization flags are inserted in the cell adaptation overhead,
which may then consist of such information as the cell sequence number,
the color component, the line number, and the location of the first
complete code word in the cell information field and its spatial location
in the frame. A number of possibilities exist for the exact adaptation
overhead structure.
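As a rough illustration of the resolution information subfield discussed
above, the following Python sketch builds both variants for a list of
blocks: one bit per block (all four layers sent, or none) and four bits
per block (one keep/drop bit per layer). The energy measure (sum of
squared coefficients), the threshold value and all names are assumptions
made for this example; the text does not say how "sufficient information"
is measured.

```python
def layer_energy(coeffs):
    """Assumed energy measure: sum of squared coefficient values."""
    return sum(c * c for c in coeffs)


def one_bit_flags(blocks, threshold):
    """One bit per block: 1 if any layer has enough energy, else 0."""
    return [
        int(any(layer_energy(layer) >= threshold for layer in block))
        for block in blocks
    ]


def four_bit_masks(blocks, threshold):
    """Four bits per block: bit k-1 is set when layer k is kept."""
    masks = []
    for block in blocks:
        mask = 0
        for k, layer in enumerate(block):
            if layer_energy(layer) >= threshold:
                mask |= 1 << k
        masks.append(mask)
    return masks


# Three toy blocks, each a list of four layers of coefficient values.
blocks = [
    [[40, -12], [3, 1], [0, 0], [0, 0]],
    [[1, 0], [0, 1], [0, 0], [0, 0]],
    [[25, 5], [9, -7], [6, 2], [1, 0]],
]
flags = one_bit_flags(blocks, threshold=30)
masks = four_bit_masks(blocks, threshold=30)
print(flags)
print([format(m, "04b") for m in masks])
```

For a 256-block I- or Q-component frame the two variants occupy 256 bits
and 1,024 bits respectively, matching the subfield sizes given in the
text.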
Experimental Results

Referring to FIGS. 10(a) and 10(b), experiments simulating cell loss
scenarios and reduced resolution reconstruction were performed on a
multi-frame video sequence comprising 512×512 pixel Y-component video
frames. FIGS. 10(a) and 10(b) are original frames from the multi-frame
sequence chosen by the encoding selector of this invention for encoding
using the intra-frame coding mode and the composite intra/inter-frame
coding mode, respectively. In the experiments, the encoding selector
weighting coefficients k1 and k2 and threshold parameter T were chosen to
be 0.2, 0.8, and 64, respectively. These coefficient and parameter values
are subject to further optimization through additional tests on a variety
of video sequences. In all the experimental images presented herein, a DCT
coefficient block size of 16×16 was chosen, four DCT coefficient layers
were defined as shown in FIG. 7, and 6, 5, 4, and 3 bits were assigned to
represent each DCT coefficient of layers 1, 2, 3, and 4, respectively. In
all the results shown pertaining to the composite intra/inter-frame coding
mode, the intra-frame vertical strip portion 102 of FIG. 5 was chosen to
have a 16 pixel width and was positioned in the middle of the video frame.
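The bit allocation stated above fixes the coefficient bit rate of a block
once the number of decoded layers is chosen. The short Python calculation
below works this out, assuming every 16×16 block carries all coefficients
of the selected layers and ignoring cell headers and adaptation overhead;
the names are illustrative.

```python
LAYER_COEFFS = (21, 24, 46, 60)  # coefficients in layers 1-4 (FIG. 7)
LAYER_BITS = (6, 5, 4, 3)        # bits per coefficient in layers 1-4
BLOCK_PIXELS = 16 * 16


def bits_per_pixel(num_layers):
    """Coefficient bits per pixel when layers 1..num_layers are sent."""
    bits = sum(
        count * width
        for count, width in zip(LAYER_COEFFS[:num_layers], LAYER_BITS[:num_layers])
    )
    return bits / BLOCK_PIXELS


for n in range(1, 5):
    print(f"layers 1-{n}: {bits_per_pixel(n):.2f} bpp")
```

This gives 0.49, 0.96, 1.68 and 2.38 bpp for one to four layers, which
agrees with the bit rates listed for the intra-frame coder in Table I.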

Referring to FIGS. 11(a)-11(d), the intra-frame mode coded video frame of
FIG. 10(a) was decoded using each of four resolution specifications
respectively defined as: (a) low resolution, decoding only layer 1
coefficients; (b) medium resolution, decoding layers 1 and 2 coefficients;
(c) medium-high resolution, decoding layers 1, 2, and 3 coefficients; and
(d) high resolution, decoding layers 1, 2, 3, and 4 coefficients.
Similarly, FIGS. 12(a)-12(d) show the intra/inter-frame mode coded video
frame of FIG. 10(b) decoded using each of these four resolution
specifications, respectively. In either case, no cells were lost during
transmission.

Referring to FIGS. 13(a)-13(d), there is shown the effect of losing 5% of
the transmitted cells (51 blocks selected at random), simulating an
extreme cell loss condition, within various coefficient layers for the
intra-frame mode coded image of FIG. 10(a) and the intra/inter-frame mode
coded image of FIG. 10(b). FIG. 13(a) shows the effect on the intra-frame
mode coded image when the lost cells correspond to coefficient layers
other than the lowest resolution layer, layer 1. FIG. 13(c) shows the
effect on the intra/inter-frame mode coded image experiencing the same
type of cell loss as that of FIG. 13(a).

As a simulated "worst case" scenario, FIG. 13(b) shows the effect on the
intra-frame mode coded image when the lost cells correspond to the lowest
resolution coefficient layer, layer 1. FIG. 13(d) shows the effect on the
intra/inter-frame mode coded image experiencing the same type of low
resolution layer cell loss as that of FIG. 13(b). This type of layer 1
cell loss simulates a "worst case" since the layer 1 coefficients are
typically sent across the network with the highest network transmission
priority, and are therefore seldom lost. Further, since each cell
typically carries a 3 or 4 bit cell sequence number, the loss of cells is
easily determined. Once the loss of a cell is determined, other error
concealment techniques, e.g., pixel replication from the previous frame,
or pixel interpolation from the present frame, etc., can be employed to
reduce the detrimental effect of the cell loss on the reconstructed video
frame.
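One way a receiver might use the 3 or 4 bit cell sequence number to
notice missing cells is sketched below in Python. The sketch assumes that
cells arrive in order without duplication and that the counter simply
wraps around; the function name is illustrative and the text does not
prescribe this procedure.

```python
def count_lost_cells(received_seq, bits=4):
    """Count cells missing between consecutive arrivals.

    received_seq: sequence numbers of the cells that arrived, in order.
    bits: width of the sequence number field (3 or 4 in the text).
    """
    modulus = 1 << bits
    lost = 0
    for prev, cur in zip(received_seq, received_seq[1:]):
        # A step of exactly 1 (modulo the counter size) means nothing was lost.
        lost += (cur - prev - 1) % modulus
    return lost


print(count_lost_cells([13, 14, 15, 0, 1]))   # wrap-around, no loss
print(count_lost_cells([14, 15, 2, 3]))       # cells 0 and 1 missing
print(count_lost_cells([6, 7, 1], bits=3))    # cell 0 missing
```

A loss burst as long as the counter period (8 or 16 cells) cannot be
distinguished from no loss at all, which is one limitation of such a
short sequence number.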

Referring to FIG. 14, Tables I and II show the signal-to-noise ratio (S/N)
and bit rates (bpp) calculated for the intra-frame mode encoded image of
FIG. 10(a) and the composite intra/inter-frame mode encoded image of FIG.
10(b), respectively, for each of the four resolution specifications
defined above, with and without cell loss. Cell loss was simulated by
random block loss as described above. An assumed overall cell loss rate of
1×10⁻⁹ was used for the calculations, and was applied uniformly across all
coefficient layers notwithstanding assumed transmission priority
differences among the different layers. For a cell size of 53 octets with
48 information octets (proposed ATM B-ISDN standard), and a 512×512 pixel
video frame having a 16×16 DCT coefficient block size, an average of about
1×10⁻⁵ cells, corresponding to 1×10⁻³ blocks, would be lost per frame.
This translates to about one block in 1000 frames, which is a much lower
rate of loss than the 5% block loss of the cases studied. Further, it is
very unlikely that cells corresponding to all coefficient layers of the
same block will be lost, which implies that the actual qualitative and
quantitative results achieved on the network will be significantly better
than the experimental results shown here. Thus, the robustness of the
invention to a wide range of cell loss rates is significant.

Another observation that may be made from the 
experimental results presented here is that as long as 
low-resolution layer 1 is present, there is negligible 
image quality loss in either the intra-frame or intra/inter-frame
mode coded images. Furthermore, in a motion
video sequence, such defects will not typically be visi- 
ble unless the sequence consists of a stationary scene. 
Experimental results indicate that coding rates in the 
range of 0.4 to 2.25 bpp at S/N ratios in the range of 30 
to 50 dB are easily obtained, and these figures vary from 
frame-to-frame and sequence-to-sequence. 

Other embodiments are within the scope of the fol- 
lowing claims. 

I claim: 

1. An apparatus for encoding a digital video frame f_i, having a matrix
with N rows and M columns of pixels x_i(n,m), for transmission over a
digital communications channel, comprising

an inter-frame encoder for encoding a said pixel x_i(n,m) of said video
frame f_i into a corresponding differentially encoded pixel
Δx_i(n,m) = x_i(n,m) - x_{i-1}(n,m) of a difference frame Δf_i dependent
on the corresponding pixel x_{i-1}(n,m) of a previous video frame f_{i-1},

an intra-frame encoder for encoding a said pixel x_i(n,m) of frame f_i
into a corresponding differentially encoded pixel Δx_i(n,m) of said
difference frame Δf_i dependent on other said pixels of the same said
video frame f_i, and

an encoding selector for selecting between said inter-frame encoder and
said intra-frame encoder for differentially encoding pixels x_i(n,m) of
frame f_i into corresponding differentially encoded pixels Δx_i(n,m) of
said difference frame Δf_i, said encoding selector being responsive to the
relative motion between said video frame f_i and said previous video frame
f_{i-1},

said coding selector further including a motion detector for detecting the
relative motion between said video frame f_i and said previous video frame
f_{i-1},
and for providing a decision parameter K representing the level of
detected relative motion, K being computed by

$$K = k_1\,\overline{\Delta x}_i + k_2\,\sigma^2$$

where Δx̄_i and σ² are an estimate of the mean and variance, respectively,
of difference pixel value Δx_i(n,m), k1 and k2 are weighting coefficients
each having a value between 0 and 1 such that k1+k2=1,

a layered resolution encoder for encoding said differential pixels
Δx_i(n,m) of frame Δf_i into a plurality of separable data sets, each said
data set representing video information, within a particular range of
video image resolution, about said differential pixels Δx_i(n,m),

a packetizer for formatting said plurality of data sets, into at least one
asynchronous transfer mode (ATM) packet for transmission over the digital
communications channel, said ATM packet comprising a header field portion
having data for establishing a virtual communications channel between
selected devices on the digital communications channel, and an information
field portion for transferring said plurality of data sets between said
selected devices,

wherein said encoding selector responds to decision parameter K by
selecting said inter-frame encoder when K<T and by selecting said
intra-frame encoder when K≥T, where T represents a threshold parameter T
having a value between 0 and 1.

2. The apparatus of claim 1, wherein 

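said estimate of mean difference pixel value Δx̄_i is computed by

$$\overline{\Delta x}_i = \sum_{j=1}^{J} \Delta x_i(j) / J$$

and, said estimate of mean difference pixel variance σ² is computed by

$$\sigma^2 = \sum_{j=1}^{J} \left(\Delta x_i(j) - \overline{\Delta x}_i\right)^2 / J$$

where Δx_i(j) is a short notation for J randomly selected pixels Δx_i(m,n)
of difference frame Δf_i.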

3. The apparatus of claim 1 wherein said encoding selector comprises

a composite frame combiner for providing a differentially encoded
difference frame Δf_i having a first set of difference pixels Δx_i(m,n)
encoded by said inter-frame encoder, and a second set of pixels Δx_i(m,n)
encoded by said intra-frame encoder.

4. The apparatus of claim 3 wherein said second set of pixels Δx_i(m,n)
comprises

at least one square pixel block having P pixels Δx_i(m,n) on each side,
where N/P, and M/P are integer values.

5. The apparatus of claim 4 wherein said second set of pixels Δx_i(m,n)
comprises

a vertical strip portion having a width of P pixels and a length of N
pixels such that said vertical strip portion comprises a quantity N/P of
said pixel blocks arranged vertically aligned and non-overlapping.

6. The apparatus of claim 5 wherein

said vertical strip portion of said second set of pixels Δx_{i+1}(m,n) of
frame f_{i+1} is offset horizontally in position from said vertical strip
portion of said second set of pixels Δx_i(m,n) of frame f_i by at least P
said pixels.

7. The apparatus of claim 4 wherein said second set of pixels Δx_i(m,n)
comprises

a horizontal strip portion having a width of M pixels and a length of P
pixels such that said horizontal strip portion comprises a quantity M/P of
said pixel blocks arranged horizontally aligned and non-overlapping.

8. The apparatus of claim 7 wherein

said horizontal strip portion of said second set of pixels Δx_{i+1}(m,n)
of frame f_{i+1} is offset vertically in position from said horizontal
strip portion of said second set of pixels Δx_i(m,n) of frame f_i by at
least P said pixels.

9. The apparatus of claim 1 wherein

said layered resolution encoder comprises a discrete cosine transform
(DCT) for transforming said differential pixels Δx_i(n,m) into a plurality
of DCT coefficients representing said differential pixels, said DCT
coefficients separable into said plurality of said data sets providing
coefficient layers.

10. The apparatus of claim 9 wherein

said difference frame Δf_i comprises a plurality of square pixel blocks
having P differential pixels Δx_i(m,n) on each side, where N/P, and M/P
are integer values, and

said layered resolution encoder transforms said differential pixels within
each said pixel block into a corresponding square coefficient block of DCT
coefficients, having P coefficients on each side, representing video
information about said differential pixels Δx_i(n,m) within said
corresponding pixel block.

11. The apparatus of claim 11 wherein said DCT 
coefficients within each said coefficient block are sepa- 
rated into said coefficient layers. 

12. The apparatus of claim 1 wherein said information 
field portion comprises an adaptation overhead field 
portion comprising 

a cell sequence number for indicating the temporal 
relationship of said ATM packet relative to other 
said ATM packets, and 

a sync flag taking one of a plurality of states for indi- 
cating the composition of the remainder of said 
information field portion, said sync flag taking a 
first state when said information field portion in- 
cludes a first type of information field, and a second 
state when said information field portion includes a 
second type of information field. 

13. The apparatus of claim 12 wherein said first type 
of information field comprises an adaptation overhead 
field portion further comprising 

a coding mode field for indicating whether said data 
sets are encoded in said inter-frame or said intra- 
frame coding mode, and 

a component type field for indicating to which said 
color video component said data sets belong. 

14. The apparatus of claim 12 wherein

said difference frame Δf_i comprises a plurality of square pixel blocks
having P differential pixels Δx_i(m,n) on each side, where N/P, and M/P
are integer values,

said layered resolution encoder comprises a discrete cosine transform
(DCT) for transforming said differential pixels Δx_i(n,m) in each said
pixel block into a corresponding block of DCT coefficients representing
said differential pixels in said corresponding [...]

said first type of information field comprises an adaptation overhead
field portion further comprising a resolution information field for
indicating which said coefficient layers are available to be transferred
between said devices for each said coefficient block.

15. The apparatus of claim 14 wherein said resolution information field
comprises

a data bit corresponding to each said coefficient block, each said data
bit taking a first state when no said coefficient layers are available to
be transferred for said corresponding coefficient block, and each said
data bit taking a second state when all said coefficient layers are
available to be transferred for said corresponding coefficient block.

16. The apparatus of claim 14 wherein said resolution information field
comprises

a block resolution field corresponding to each said coefficient block,
each said block resolution field taking one of a plurality of states for
indicating which said coefficient layers are available to be transferred
for said corresponding coefficient block.

17. A method for encoding a digital video frame f_i, having N rows and M
columns of pixels x_i(n,m), for transmission over a digital communications
channel, comprising the steps of

selecting between an inter-frame encoding mode and an intra-frame encoding
mode for differentially encoding pixels x_i(n,m) of a video frame f_i into
corresponding differentially encoded pixels Δx_i(n,m) of a difference
frame Δf_i, said selection being determined by the relative motion between
said video frame f_i and a previous video frame f_{i-1},

said step of selecting between an inter-frame encoding mode and an
intra-frame encoding mode further including detecting the relative motion
between said video frame f_i and said previous video frame f_{i-1}, and
computing a decision parameter K representing the level of detected
relative motion, K being computed by

$$K = k_1\,\overline{\Delta x}_i + k_2\,\sigma^2$$

where Δx̄_i and σ² are an estimate of the mean and variance, respectively,
of difference pixel values Δx_i(n,m), k1 and k2 are weighting coefficients
each having a value between 0 and 1 such that k1+k2=1, selecting said
inter-frame encoder when K<T, selecting said intra-frame encoder when K≥T,
where T represents a threshold parameter T having a value between 0 and 1,

if said inter-frame encoding mode is selected, then encoding a said pixel
x_i(n,m) of frame f_i into a corresponding differentially encoded pixel
Δx_i(n,m) of said difference frame Δf_i, dependent on the corresponding
pixel x_{i-1}(n,m) of said previous video frame f_{i-1}, and

if said intra-frame encoding mode is selected, then encoding a said pixel
x_i(n,m) of said frame f_i into a corresponding differentially encoded
pixel Δx_i(n,m) computed by Δx_i(n,m) = x_i(n,m) - x[...]_i(n,m) dependent
on other said pixels of the same said video frame f_i,

layer encoding said differential pixels Δx_i(n,m) of [...] about said
differential pixels Δx_i(n,m),

formatting said plurality of data sets, into at least one asynchronous
transfer mode (ATM) packet for transmission over the digital
communications channel, said ATM packet comprising a header field portion
having data for establishing a virtual communications channel between
selected devices on the digital communications channel, and an information
field portion for transferring said plurality of data sets between said
selected devices.

18. The method of claim 17, wherein

said estimate of mean difference pixel value Δx̄_i is computed by

$$\overline{\Delta x}_i = \sum_{j=1}^{J} \Delta x_i(j) / J$$

and, said estimate of mean difference pixel variance σ² is computed by

$$\sigma^2 = \sum_{j=1}^{J} \left(\Delta x_i(j) - \overline{\Delta x}_i\right)^2 / J$$

where Δx_i(j) is a short notation for J randomly selected pixels Δx_i(m,n)
of difference frame Δf_i.

19. The method of claim 17 wherein said selecting step comprises

combining said inter-frame and intra-frame encoding modes to provide a
differentially encoded difference frame Δf_i having a first set of pixels
Δx_i(m,n) encoded by said inter-frame encoding mode, and a second set of
pixels Δx_i(m,n) encoded by said intra-frame encoding mode.

20. The method of claim 19 wherein said second set of pixels Δx_i(m,n)
comprises

at least one square pixel block having P pixels Δx_i(m,n) on each side,
where N/P, and M/P are integer values.

21. The method of claim 20 wherein said second set of pixels Δx_i(m,n)
comprises

a vertical strip portion having a width of P pixels and a length of N
pixels such that said vertical strip portion comprises a quantity N/P of
said pixel blocks arranged vertically aligned and non-overlapping.

22. The method of claim 21 further comprising the step of

horizontally offsetting the position of said vertical strip portion of
said second set of pixels Δx_{i+1}(m,n) of frame f_{i+1} from the position
of said vertical strip portion of said second set of pixels Δx_i(m,n) of
frame f_i by at least P said pixels.

23. The method of claim 20 wherein said second set of pixels Δx_i(m,n)
comprises

a horizontal strip portion having a width of M pixels and a length of P
pixels such that said horizontal strip portion comprises a quantity M/P of
said pixel blocks arranged horizontally aligned and non-overlapping.




24. The method of claim 23 further comprising the 
step of 

vertically offsetting the position of said horizontal 
strip portion of said second set of pixels Ax/+ i(m,n) 

of frame ft+jjrom the position of said horizontal 5_ 

strip portion of said second set of pixels AxXm.n j of 
frame ft by at least P said pixels. 

25. The method of claim 17 wherein said layer encod- 
ing step comprises 

transforming said differential pixels Axj(n f m) into a 10 
plurality of discrete cosine transform (DCT) coeffi- 
cients representing said differential pixels* and 

separating said DCT coefficients into said plurality of 
said data sets providing coefficient layers. 

26. The method of claim 25 wherein lis 
said difference frame Af/ comprises a plurality of 

square pixel blocks having P differential pixels 
Ax/(m,n) on each side, where N/P, and M/P are 
integer values, and 
said transforming step transforms said differential 20 
pixels within each said pixel block into a corre- 
sponding square coefficient block of DCT coeffici- 
ents, having P coefficients on each side, represent- 
ing video information about said differential pixels 
Ax,<n,m) within said corresponding pixel block. 25 

27. The method of claim 33 further comprising the 
step of separating said DCT coefficients within each 
said coefficient block into said coefficient layers. 

28. The method of claim 17 wherein said information 
field portion comprises an adaptation overhead field 30 
portion comprising 

a cell sequence number for indicating the temporal 
relationship of said ATM packet relative to other 
said ATM packets, and 

a sync flag taking one of a plurality of states for indi- 35 
eating the composition of the remainder of said 
information field portion, said sync flag taking a 
first state when said information field portion in- 
cludes a first type of information field, and a second 
state when said information field portion includes a 40 
second type of information field. 

29. The method of claim 28 wherein said first type of 
information field comprises an adaptation overhead 
field portion further comprising 

a coding mode field for indicating whether said data 45 
sets are encoded in said inter-frame or said intra- 
frame coding mode, and 

a component type field for indicating to which said 
color video component said data sets belong. 

30. The method of claim 28 wherein 50 
said difference frame Af/ comprises a plurality of 

square pixel blocks having P differential pixels 
Ax<m,n) on each side, where N/P, and M/P are 
integer values, 

said layer encoding step comprises transforming said
differential pixels Δx_t(n,m) in each said pixel block
into a corresponding block of discrete cosine trans-
form (DCT) coefficients representing said differen-
tial pixels in said corresponding pixel block, and
separating said DCT coefficients into said plurality
of said data sets providing coefficient layers for 
each said DCT coefficient block, and 

wherein said first type of information field comprises 
an adaptation overhead field portion further com-
prising a resolution information field for indicating
which said coefficient layers are available to be 
transferred between said devices for each said coef- 
ficient block. 



31. The method of claim 30 wherein said resolution
information field comprises 

a data bit corresponding to each said coefficient 
block, each said data bit taking a first state when no
said coefficient layers are available to be trans-
ferred for said corresponding coefficient block, and
each said data bit taking a second state when all 
said coefficient layers are available to be trans- 
ferred for said corresponding coefficient block. 

32. The method of claim 30 wherein said resolution 
information field comprises 

a block resolution field corresponding to each said 
coefficient block, each said block resolution field 
taking one of a plurality of states for indicating 
which said coefficient layers are available to be 
transferred for said corresponding coefficient 
block. 
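The two forms of resolution information field in claims 31 and 32 can be contrasted in a short sketch. It assumes a polarity of 1 for "all layers available" in the one-bit form, and for the multi-state form it assumes that each state is the number of the highest available layer stored in a 2-bit field; both are hypothetical choices not stated in the claims.

```python
def pack_block_bits(all_layers_available):
    """One data bit per coefficient block: 1 = all layers, 0 = no layers."""
    value = 0
    for flag in all_layers_available:
        value = (value << 1) | (1 if flag else 0)
    return value

def pack_block_fields(layer_states, bits_per_block=2):
    """One multi-state block resolution field per coefficient block."""
    value = 0
    for state in layer_states:
        if not 0 <= state < (1 << bits_per_block):
            raise ValueError("state does not fit in the field")
        value = (value << bits_per_block) | state
    return value

def unpack_block_fields(value, n_blocks, bits_per_block=2):
    """Inverse of pack_block_fields for a known number of blocks."""
    mask = (1 << bits_per_block) - 1
    return [(value >> (bits_per_block * (n_blocks - 1 - i))) & mask
            for i in range(n_blocks)]

print(bin(pack_block_bits([True, False, True, True])))  # prints 0b1011
packed = pack_block_fields([3, 0, 2, 1])
print(unpack_block_fields(packed, 4))                   # prints [3, 0, 2, 1]
```

The one-bit form costs less overhead per block, while the multi-state form lets the sender drop only some layers of a block instead of all or none.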

33. An apparatus for transferring a digital video 
frame f_t, having N rows and M columns of pixels
x_t(n,m), for transmission over a digital communications
channel, comprising 

an inter-frame encoder for encoding a said pixel
x_t(n,m) of a video frame f_t into a corresponding
differentially encoded pixel Δx_t(n,m) of a differ-
ence frame Δf_t dependent on the corresponding
pixel x_{t-1}(n,m) of a previous video frame f_{t-1},
an intra-frame encoder for encoding a said pixel
x_t(n,m) of a video frame f_t into a corresponding
differentially encoded pixel Δx_t(n,m) dependent on
other pixels of the same said frame f_t,
an encoding selector for selecting between said inter-
frame encoder and said intra-frame encoder for
differentially encoding pixels Δx_t(n,m) of said
frame f_t into corresponding differentially encoded
pixels Δx_t(n,m) of said frame Δf_t, said encoding
selector being responsive to the relative motion
between said video frame f_t and said previous video
frame f_{t-1}, and
a layered resolution encoder for encoding said differ-
ential pixels Δx_t(n,m) of frames Δf_t into a plurality
of separable data sets, each said data set represent-
ing video information, within a particular range of
video image resolution, about said differential pix-
els Δx_t(n,m),

a packetizer for formatting said plurality of data sets 
into at least one asynchronous transfer mode 
(ATM) packet for transmission over the digital 
communications channel, said ATM packet com- 
prising a header field portion having data for estab- 
lishing a virtual communications channel between 
selected devices on the digital communications 
channel, and an information field portion for trans- 
ferring said plurality of data sets between said se- 
lected devices, 
said information field portion having an adaptation 
overhead field portion including a cell sequence 
number for indicating the temporal relationship of 
said ATM packet relative to other said ATM pack- 
ets, and 

a sync flag taking one of a plurality of states for indi-
cating the composition of the remainder of said
information field portion, said sync flag taking a 
first state when said information field portion in- 
cludes a first type of information field, and a second 
state when said information field portion includes a 
second type of information field, 
said encoding selector comprises a composite frame
combiner for providing a differentially encoded
difference frame Δf_t having a first set of pixels
Δx_t(n,m) encoded by said inter-frame encoder, and
a second set of pixels Δx_t(n,m) encoded by said
intra-frame encoder, and
said first type of information field comprises an
adaptation overhead field portion further compris-
ing

a strip location field for indicating the location of said
second set of pixels relative to said first set of pixels
within said encoded difference frame Δf_t.
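The interplay of the inter-frame encoder, intra-frame encoder, encoding selector and composite frame combiner of claim 33 can be sketched as follows. Several points are assumptions made only for illustration: the intra-frame predictor is taken to be the left neighbour in the same row, relative motion is measured as a mean absolute frame difference, high motion is taken to select the intra-frame encoder, and the intra-coded strip is reduced to a single pixel column identified by a hypothetical strip_col value standing in for the strip location field.

```python
def inter_diff(cur, prev):
    """Difference of each pixel from the same pixel of the previous frame."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(cur, prev)]

def intra_diff(cur):
    """Difference of each pixel from its left neighbour (assumed predictor).

    The first pixel of each row has no neighbour and is predicted from 0.
    """
    return [[row[m] - (row[m - 1] if m > 0 else 0) for m in range(len(row))]
            for row in cur]

def relative_motion(cur, prev):
    """Mean absolute difference between frames (assumed motion measure)."""
    count = sum(len(row) for row in cur)
    total = sum(abs(c - p) for row_c, row_p in zip(cur, prev)
                for c, p in zip(row_c, row_p))
    return total / count

def encode_frame(cur, prev, threshold=10.0):
    """Select a coding mode from the relative motion (assumed direction)."""
    if relative_motion(cur, prev) > threshold:
        return "intra", intra_diff(cur)
    return "inter", inter_diff(cur, prev)

def composite_frame(cur, prev, strip_col):
    """Inter-coded frame with one intra-coded vertical strip at strip_col."""
    out = inter_diff(cur, prev)
    intra = intra_diff(cur)
    for n in range(len(cur)):
        out[n][strip_col] = intra[n][strip_col]
    return out

prev = [[10, 10], [10, 10]]
cur = [[12, 11], [10, 9]]
print(encode_frame(cur, prev))
print(composite_frame(cur, prev, strip_col=1))
```

In this sketch the decoder would need strip_col to know which pixels to reconstruct from the current frame alone, which is the role the claim assigns to the strip location field.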

* * * * *